Abstract

We study the power-law correlation properties of the Google search queries for Dow Jones
Industrial Average (DJIA) component stocks. Examining the daily data of the searched
terms with a combination of the rescaled range and rescaled variance tests together with the
detrended fluctuation analysis, we show that the searches are in fact power-law correlated,
with Hurst exponents between 0.83 and 1.16. The general interest in the DJIA stocks is
thus strongly persistent. We further reinvestigate the cross-correlation structure between
the searches, traded volume and volatility of the component stocks using the detrended
cross-correlation and detrending moving-average cross-correlation coefficients. Contrary to
the universal power-law correlation structure of the related Google searches, the results
suggest that there is no universal relationship between the online search queries and the
analyzed financial measures. Even though we confirm positive correlation for a majority of
pairs, there are several pairs with insignificant or even negative correlations. In addition,
the correlations vary quite strongly across scales.

Keywords: online searches, Google Trends, long-term memory, cross-correlations, 
volatility, traded volume 
1. Introduction

Analysis of the online activity of internet users has proved its worth in various disciplines,
most notably in psychology [1-5], ecology [6-9], epidemiology [10-14], medicine [15, 16],
linguistics [17], politology [18], sociology [19, 20] and in a wide range of economics,
marketing and finance applications [21-35]. In the economic and financial applications, the
focus has been primarily on search queries submitted to search engines such as Google, Yahoo!
and Baidu. Bank et al. [25] find a connection between Google searches and liquidity on the
German stock market. Bordino et al. [...] study the traded volume of the NASDAQ-100 index
component stocks and report that it is correlated with the related searches on the Yahoo!
engine. Vlastakis & Markellos [28] find a positive correlation between internet search
queries for NASDAQ and NYSE stocks and their traded volume and volatility. Dzielinski
[...] introduces an uncertainty measure based on financial online search queries. Preis
et al. [21] show that Google searches for financial terms can be used to construct profitable
trading strategies. Kristoufek [32] utilizes the popularity of the Dow Jones stocks, measured
by Google search queries, for portfolio diversification. Kristoufek [33] further studies the
dynamics between Google searches, Wikipedia page views and the Bitcoin crypto-currency,
uncovering a strong relationship between them. Moat et al. [22] report that even Wikipedia
page views can be utilized for trading strategy construction. And Curme et al. [22]
cluster the online searches into groups and show that mainly politics- and business-oriented
searches are connected to stock market movements.

The frequently reported relationship between online searches, traded volume and volatility
in turn draws attention to the dynamic characteristics of the online search time series. As
traded volume and volatility have been repeatedly studied for their power-law correlation
structures [26, ...], the same line of research is at hand for online searches as well.
Potential long-term memory of online activity has further implications for modeling and for
a correct inspection of the dynamics between the searches and other series. Here, we
examine the correlation structure of the Google searches related to the Dow Jones Industrial
Average (DJIA) index components. We utilize daily Google search data for the DJIA
components and, as such, present the first study of this kind of the correlation structure of
online searches. To do so, we apply the rescaled range and rescaled variance tests to uncover
the power-law correlation structure, and we further proceed with the detrended fluctuation
analysis of the search query series. As it turns out that the DJIA-related Google queries are
in fact power-law correlated, we reinvestigate the popular topic of cross-correlations between
the searches, traded volume and volatility of the examined stocks. As the online searches are
power-law correlated and on the edge of (non-)stationarity, we utilize the recently proposed
correlation coefficients based on the detrended cross-correlation and detrending
moving-average cross-correlation analyses.

The paper is organized as follows. In Section 2, we describe the methodology, specifically
the rescaled range and rescaled variance tests together with the moving block bootstrap
significance criterion, the detrended fluctuation analysis, and the correlation coefficients.
Section 3 introduces the dataset and presents the results. Section 4 concludes. We show that
the Google searches related to the DJIA component stocks exhibit scaling characteristic of
power-law correlated processes. This is supported by all the utilized methods. General
interest in publicly traded companies thus shares properties with the variance and traded
volume series: there are pronounced periods of high interest followed by long-lived periods
of low interest. However, the search series always revert back to a long-term trend, so that
no explosive behavior is observed. After the long-term memory of the online query series is
taken into consideration, the correlations between the searches, traded volume and volatility
become quite unstable and no universal relationship is found. The initial long-term memory
analysis thus proves crucial for a correct treatment of cross-correlations between online
searches and various possibly connected series.


2. Methodology 


2.1. Long-term memory and its tests 

Long-term memory (alternatively long-range dependence or long-range correlations) is
defined through a power-law decay of the auto-correlation function $\rho(k)$, which scales as
$\rho(k) \propto k^{2H-2}$ for lag $k \to +\infty$ [39, 40, 41]. Such series are then also
referred to as power-law (auto-)correlated processes. The characteristic parameter of such
processes is the Hurst exponent $H$ (or, alternatively, the parameter $\alpha$), which takes
values between 0 and 1 for stationary processes. The breaking value $H = 0.5$ characterizes
a process with no long-term memory. Processes with $H > 0.5$ are usually referred to as
persistent, whereas those with $H < 0.5$ are anti-persistent. The former are reminiscent of
locally trending processes which, however, remain stationary (for $H < 1$) and return to their
mean value quickly enough. The latter are very erratic, as they switch direction more
frequently than uncorrelated processes. Integrating the stationary long-range dependent
processes once creates an additional category of processes with interesting properties. For
$1 < H < 1.5$, we have non-stationary yet still mean-reverting processes. The frontier
$H = 1.5$ marks a unit root process, and $H > 1.5$ characterizes processes which are
non-stationary and not mean-reverting, i.e. explosive processes. The long-term memory
property has far-reaching consequences for time series modeling and forecasting, mainly
because it implies a non-summable auto-correlation function [...]: for $0.5 < H < 1$, the
exponent $2H - 2$ lies in $(-1, 0)$, so that $\sum_k \rho(k)$ diverges. It is therefore essential
to distinguish between long-range dependence with its power-law correlations and short-range
dependence with its exponentially decaying correlations. For this purpose, we utilize the
modified rescaled range test and the rescaled variance test.
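To make the power-law decay and the non-summability concrete, the sketch below uses fractional Gaussian noise, a standard stationary long-range dependent process chosen here purely for illustration (it is not a model fitted in this paper). Its exact autocorrelation is $\rho(k) = \frac{1}{2}(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})$, which behaves asymptotically as $H(2H-1)k^{2H-2}$. The function names are hypothetical.

```python
import numpy as np


def fgn_acf(k, H):
    """Exact autocorrelation of fractional Gaussian noise at integer lags k >= 0."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))


def fgn_acf_asymptote(k, H):
    """Leading power-law term H(2H - 1) k^(2H - 2), valid for large k > 0."""
    k = np.asarray(k, dtype=float)
    return H * (2 * H - 1) * k ** (2 * H - 2)


if __name__ == "__main__":
    H = 0.9  # persistent, stationary case (0.5 < H < 1)
    lags = np.array([10, 100, 1000, 10000])
    # The ratio of the exact ACF to its power-law asymptote tends to 1 as the lag grows.
    print(fgn_acf(lags, H) / fgn_acf_asymptote(lags, H))
    # Partial sums of rho(k) keep growing with the cutoff: the ACF is not summable.
    for cutoff in (10**3, 10**4, 10**5):
        print(cutoff, fgn_acf(np.arange(1, cutoff + 1), H).sum())
```

For $H = 0.5$ the autocorrelation vanishes at every non-zero lag, which is the "no long-term memory" case described above.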

The modified rescaled range test is an adjusted version of the original rescaled range
analysis [39]. Both methods are based on the scaling of the rescaled ranges with increasing
time series length. For the time series $\{x_t\}$ with $t = 1, 2, \ldots, T$, the testing
statistic $V_T$ is defined as

$$V_T = \frac{R}{S\sqrt{T}},$$

where $R$ is the range of the profile of the analyzed series,

$$R = \max_{t=1,\ldots,T} \sum_{i=1}^{t} (x_i - \bar{x}) - \min_{t=1,\ldots,T} \sum_{i=1}^{t} (x_i - \bar{x}),$$

with $\bar{x}$ being the time series average, and $S$ is a heteroskedasticity and
autocorrelation consistent (HAC) estimator of the standard deviation of the original series,
defined as

$$S^2 = \hat{\gamma}(0) + 2 \sum_{k=1}^{q} \left(1 - \frac{k}{q+1}\right) \hat{\gamma}(k), \qquad (1)$$

with $\hat{\gamma}(k)$ being the estimated auto-covariance at lag $k$, weighted by the Bartlett
kernel. Note that $\hat{\gamma}(0)$ is the estimated variance. The crucial difference between
the original and the modified version of the test lies in Eq. (1), which is constructed to
control for a possible short-term memory bias. The selection of the parameter $q$ then
becomes crucial: an overestimated $q$ can suppress even genuine long-term memory, whereas an
underestimated $q$ can lead to spuriously detected long-term memory which is in fact only a
strong short-term memory. We stick to the automatic selection criterion proposed by Lo [42],

$$q^* = \left\lfloor \left(\frac{3T}{2}\right)^{1/3} \left(\frac{2\hat{\rho}(1)}{1 - \hat{\rho}(1)^2}\right)^{2/3} \right\rfloor, \qquad (2)$$

where $\hat{\rho}(1)$ is the sample first-order autocorrelation and $\lfloor \cdot \rfloor$ is the
lower integer (floor) operator.
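A minimal NumPy sketch of Eqs. (1) and (2) and of the statistic $V_T$ may help in reading the definitions. The function names are hypothetical, and the absolute value inside the $q^*$ formula is our implementation choice: it yields the real value of the $2/3$ power and avoids a NaN from a fractional power of a negative float when $\hat{\rho}(1) < 0$.

```python
import numpy as np


def autocovariance(x, k):
    """Sample auto-covariance gamma(k), normalised by T."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.dot(d[: x.size - k], d[k:]) / x.size


def andrews_q(x):
    """Automatic lag q* of Eq. (2)."""
    rho1 = autocovariance(x, 1) / autocovariance(x, 0)
    T = np.asarray(x).size
    # |2 rho / (1 - rho^2)|^(2/3) equals the real value of the 2/3 power
    return int(np.floor((1.5 * T) ** (1 / 3) * abs(2 * rho1 / (1 - rho1**2)) ** (2 / 3)))


def hac_variance(x, q):
    """Eq. (1): S^2 = gamma(0) + 2 * sum_{k=1}^{q} (1 - k / (q + 1)) * gamma(k)."""
    s2 = autocovariance(x, 0)
    for k in range(1, q + 1):
        s2 += 2.0 * (1.0 - k / (q + 1.0)) * autocovariance(x, k)
    return s2


def modified_rs(x, q=None):
    """Lo's modified rescaled range statistic V_T = R / (S * sqrt(T)); returns (V_T, q)."""
    x = np.asarray(x, dtype=float)
    if q is None:
        q = andrews_q(x)
    profile = np.cumsum(x - x.mean())  # partial sums of deviations from the mean
    R = profile.max() - profile.min()
    S = np.sqrt(hac_variance(x, q))
    return R / (S * np.sqrt(x.size)), q
```

For uncorrelated input, $V_T$ is of order one and $q^*$ is small; a strongly autocorrelated input pushes $q^*$ up considerably, which is the mechanism behind the large $q^*$ values discussed in Section 3.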

The rescaled variance test [43] is based on a very similar idea but, as the name suggests,
it uses the variance of the profile rather than its range, so that it is less sensitive to
extreme values. The testing statistic $M_T$ is defined as

$$M_T = \frac{\widehat{\mathrm{var}}(X)}{T S^2},$$

where $\widehat{\mathrm{var}}(X)$ is the variance of the profile $X$ of the original series. To
control for the short-term memory bias, the HAC standard deviation from Eq. (1) and the
optimal $q^*$ from Eq. (2) are used here as well.
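The rescaled variance statistic can be sketched in the same way. This is a hypothetical helper, assuming the Bartlett HAC variance of Eq. (1) with a given $q$ and the population variance of the profile in the numerator.

```python
import numpy as np


def _autocov(x, k):
    """Sample auto-covariance at lag k, normalised by T."""
    d = x - x.mean()
    return np.dot(d[: x.size - k], d[k:]) / x.size


def rescaled_variance(x, q):
    """M_T = var(X) / (T * S^2), X the profile, S^2 the Bartlett HAC variance of Eq. (1)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    profile = np.cumsum(x - x.mean())
    s2 = _autocov(x, 0) + 2.0 * sum(
        (1.0 - k / (q + 1.0)) * _autocov(x, k) for k in range(1, q + 1)
    )
    return profile.var() / (T * s2)
```

Because both the numerator and $S^2$ scale with the square of the data, $M_T$ is invariant to affine rescaling of the series, which is convenient for Google Trends data that are only available as rescaled proportions.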

Even though both $V_T$ and $M_T$ have well-defined asymptotic critical values [42, 43], we
opt for an alternative approach utilizing the moving block bootstrap methodology [45]
due to the finite sample, the very heterogeneous dynamics of the analyzed series, and
their distributional properties. In the procedure, surrogate series are formed by shuffling
blocks of a fixed size from the original series. This way, the short-term correlations
and distributional properties are kept but the long-term correlations are shuffled away,
creating a distribution of the testing statistic under a more realistic null hypothesis. In
our application, we fix the block size to 25 observations and bootstrap 1000 surrogate
series to obtain statistical significance.
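A minimal sketch of the bootstrap step follows. It assumes the standard moving block bootstrap (overlapping blocks drawn with replacement and concatenated); the paper describes the resampling only as shuffling blocks, so the exact variant, the upper-tail direction and the $(count+1)/(n+1)$ correction are our assumptions. `statistic` is any function mapping a series to a number, for instance an implementation of $V_T$ or $M_T$.

```python
import numpy as np


def moving_block_surrogate(x, block, rng):
    """Concatenate overlapping blocks of length `block`, drawn with replacement, up to length T."""
    x = np.asarray(x, dtype=float)
    T = x.size
    n_blocks = -(-T // block)  # ceiling division
    starts = rng.integers(0, T - block + 1, size=n_blocks)
    return np.concatenate([x[s : s + block] for s in starts])[:T]


def bootstrap_pvalue(x, statistic, block=25, n_boot=1000, seed=0):
    """Upper-tail p-value of statistic(x) against moving block bootstrap surrogates.

    Large values of V_T or M_T point to long-term memory, hence the upper tail."""
    rng = np.random.default_rng(seed)
    observed = statistic(np.asarray(x, dtype=float))
    null = np.array(
        [statistic(moving_block_surrogate(x, block, rng)) for _ in range(n_boot)]
    )
    # (count + 1) / (n + 1) keeps the p-value strictly positive for a finite bootstrap
    return observed, (np.sum(null >= observed) + 1) / (n_boot + 1)
```

Blocks of 25 observations preserve dependence within roughly a month of trading days, while the random ordering of blocks destroys dependence at longer horizons.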


2.2. Detrended fluctuation analysis 

Detrended fluctuation analysis (DFA) [46, 47, 48] is the most popular and most frequently
applied time-domain estimator of the Hurst exponent. This is mainly due to the fact that
DFA works under various settings such as non-stationarity and trends [...], periodic cycles
and seasonalities [49], and heavy tails [50].
The procedure is based on the following steps. We work with the profile $X(t)$ of the
series $\{x_t\}$ with $t = 1, \ldots, T$, defined as

$$X(t) = \sum_{i=1}^{t} (x_i - \bar{x}).$$

The profile is divided into $T_s = \lfloor T/s \rfloor$ non-overlapping windows of length $s$,
which is referred to as a scale. The time series length $T$ may not be divisible by $s$, in
which case the end of the series would not be used in the procedure. For this reason, the
series is additionally divided into boxes starting from the end of the series, so that we
obtain $2T_s$ boxes of size $s$. In each of these boxes, we calculate the mean squared
deviation from the linear time trend inside the box. For the $j$th box of size $s$, we obtain

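$$F^2(j, s) = \frac{1}{s} \sum_{i=1}^{s} \left(X((j-1)s + i) - \hat{X}_j(i)\right)^2, \qquad j = 1, \ldots, T_s,$$

where $\hat{X}_j(i)$ is the linear fit of a time trend at position $i$ in window $j$. In a similar
manner, we obtain the fluctuation for the boxes formed from the end of the series as

$$F^2(j, s) = \frac{1}{s} \sum_{i=1}^{s} \left(X(T - s[j - T_s] + i) - \hat{X}_j(i)\right)^2, \qquad j = T_s + 1, \ldots, 2T_s.$$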


We then construct the fluctuation for a specific scale $s$ as

$$F(s) = \sqrt{\frac{1}{2T_s} \sum_{j=1}^{2T_s} F^2(j, s)},$$

and finally, we obtain the Hurst exponent via the scaling law

$$F(s) \propto s^{H}. \qquad (3)$$


In the application, we estimate the exponent for scales between $s_{min} = 10$ and
$s_{max} = 500 \approx T/5$. Moreover, for better illustration, we base the estimation and the
results on scales $s$ that are powers of 10 with exponents on a grid of step 0.1, i.e.
$s \approx 10^{1.0}, 10^{1.1}, 10^{1.2}, \ldots$. Note, however, that the results do not change
qualitatively for other specifications of scales and box-splitting procedures; this approach is
thus kept primarily for a straightforward presentation of the results.
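The steps above can be sketched compactly in NumPy as DFA with linear detrending, including the $2T_s$ boxes counted from both ends of the profile. `dfa_fluctuation` and `dfa_hurst` are hypothetical names, `np.polyfit` performs the in-box linear fit, and rounding the $10^{0.1k}$ scale grid to integers is our assumption.

```python
import numpy as np


def dfa_fluctuation(x, s):
    """F(s) for DFA with linear detrending, using 2*T_s boxes (from the start and from the end)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    profile = np.cumsum(x - x.mean())
    Ts = T // s
    t = np.arange(s)
    sq_dev = []
    # offset 0: boxes from the start; offset T - Ts*s: boxes aligned with the end
    for offset in (0, T - Ts * s):
        for j in range(Ts):
            seg = profile[offset + j * s : offset + (j + 1) * s]
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # linear fit inside the box
            sq_dev.append(np.mean((seg - trend) ** 2))
    return np.sqrt(np.mean(sq_dev))


def dfa_hurst(x, s_min=10, s_max=None):
    """Slope of log F(s) against log s, Eq. (3), on scales s ~ 10^1.0, 10^1.1, ..."""
    x = np.asarray(x, dtype=float)
    if s_max is None:
        s_max = x.size // 5
    exponents = np.arange(np.log10(s_min), np.log10(s_max) + 1e-9, 0.1)
    scales = np.unique(np.round(10.0**exponents).astype(int))
    F = np.array([dfa_fluctuation(x, s) for s in scales])
    slope, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return slope
```

On white noise the estimated slope should come out close to 0.5, and on its cumulative sum (a random walk) close to 1.5, the unit-root frontier discussed in Section 2.1.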


2.3. DCCA and DMCA coefficients 

The detrended cross-correlation coefficient $\rho_{DCCA}(s)$ for scale $s$, as proposed by
Zebende [51], is a combination of the detrended fluctuation analysis (DFA) [46, 47, 48] and the
detrended cross-correlation analysis (DCCA) [...]. The coefficient is defined as

$$\rho_{DCCA}(s) = \frac{F^2_{DCCA}(s)}{F_{x,DFA}(s)\, F_{y,DFA}(s)},$$

where $F^2_{DCCA}(s)$ is the detrended covariance between the profiles of series $\{x_t\}$ and
$\{y_t\}$ based on a window of size $s$, and $F^2_{x,DFA}(s)$ and $F^2_{y,DFA}(s)$ are the
detrended variances of the profiles of the separate series, respectively, for a window size
$s$.¹ For a time series of length $T$, the series is divided into non-overlapping boxes of
length $s$. In each box, fluctuation functions are computed for the linearly detrended series,
and these are in turn averaged over all boxes of the same length. When $T$ is not divisible
by $s$, the series is divided from the beginning as well as from the end, and the averages are
based on these sub-periods, as in the case of DFA. More details about the method and some
alternative specifications can be found in Refs. [...].
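The coefficient can be sketched directly from this description: profiles of both series, linear detrending in each of the $2T_s$ boxes, then the ratio of the averaged detrended covariance to the geometric mean of the detrended variances. The helper names are hypothetical and the sketch uses the same box convention as DFA above.

```python
import numpy as np


def _detrended_cov(X, Y, s):
    """Average covariance of linearly detrended profile segments over 2*T_s boxes of size s."""
    T = X.size
    Ts = T // s
    t = np.arange(s)
    covs = []
    for offset in (0, T - Ts * s):  # boxes from the start, then boxes aligned with the end
        for j in range(Ts):
            box = slice(offset + j * s, offset + (j + 1) * s)
            rx = X[box] - np.polyval(np.polyfit(t, X[box], 1), t)
            ry = Y[box] - np.polyval(np.polyfit(t, Y[box], 1), t)
            covs.append(np.mean(rx * ry))
    return np.mean(covs)


def rho_dcca(x, y, s):
    """rho_DCCA(s) = F2_DCCA(s) / (F_x,DFA(s) * F_y,DFA(s))."""
    X = np.cumsum(np.asarray(x, dtype=float) - np.mean(x))
    Y = np.cumsum(np.asarray(y, dtype=float) - np.mean(y))
    return _detrended_cov(X, Y, s) / np.sqrt(_detrended_cov(X, X, s) * _detrended_cov(Y, Y, s))
```

Since all boxes have the same length, the coefficient is a Cauchy-Schwarz-normalized inner product of the pooled residuals, so it lies in $[-1, 1]$ and equals 1 when the two series coincide.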

The detrending moving-average cross-correlation coefficient $\rho_{DMCA}(\lambda)$ for scale
$\lambda$ has been introduced by Kristoufek [60] as an alternative to the DCCA coefficient. The
method builds on a connection between the detrending moving average (DMA) procedure
[61, 62] and the detrending moving-average cross-correlation analysis (DMCA) [...]. The
coefficient is defined as

$$\rho_{DMCA}(\lambda) = \frac{F^2_{DMCA}(\lambda)}{F_{x,DMA}(\lambda)\, F_{y,DMA}(\lambda)},$$

where $F^2_{DMCA}(\lambda)$ is the detrended covariance between the profiles of the examined
series, and $F^2_{x,DMA}(\lambda)$ and $F^2_{y,DMA}(\lambda)$ are the detrended variances of the
separate series, respectively, with a moving average parameter $\lambda$. The fluctuation
functions are based on series detrended by a centered moving average of length $\lambda$.
Various specifications can be utilized for the detrending, but the centered averaging has been
shown to outperform the alternatives [...]. Contrary to the DCCA coefficient, the DMCA
coefficient is not based on box splitting and it is thus computationally more efficient. More
details can be found in Refs. [60, ..., 63].
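A sketch of the DMCA coefficient with a centered moving average follows. It assumes an odd window length $\lambda$ (needed for a symmetric centered average) and averages the residuals over all points where the centered average is defined, i.e. the first and last $(\lambda-1)/2$ points of the profile are dropped; `rho_dmca` is a hypothetical name.

```python
import numpy as np


def rho_dmca(x, y, lam):
    """DMCA coefficient with a centred moving average of odd length lam >= 3."""
    if lam % 2 == 0:
        raise ValueError("a centred moving average needs an odd window length")
    X = np.cumsum(np.asarray(x, dtype=float) - np.mean(x))
    Y = np.cumsum(np.asarray(y, dtype=float) - np.mean(y))
    half = (lam - 1) // 2
    kernel = np.ones(lam) / lam
    # mode="valid" gives the centred average for t = half, ..., T - 1 - half
    rx = X[half : X.size - half] - np.convolve(X, kernel, mode="valid")
    ry = Y[half : Y.size - half] - np.convolve(Y, kernel, mode="valid")
    return np.mean(rx * ry) / np.sqrt(np.mean(rx**2) * np.mean(ry**2))
```

A single convolution per series replaces the per-box regressions of DCCA, which illustrates why the moving-average version is computationally cheaper.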

In a series of papers, Kristoufek [55, 60] shows that the statistical properties of both
methods depend strongly on the long-term memory properties of the separate series. Moreover,
the reliability of the methods is not constant across different levels of correlation between
the studied processes either. To control for such effects, we apply Theiler's Amplitude
Adjusted Fourier Transform (TAAF) [66]. This method reconstructs a series with exactly the
same distribution and approximately the same spectral properties as the original one. This
way, we obtain two series with unchanged auto-correlation and distributional structure which
are, however, mutually uncorrelated. The statistical significance of the correlations estimated
with DCCA and DMCA can then be tested.

Specifically, for each studied pair of processes, we obtain TAAF-transformed series which
are not cross-correlated but retain the auto-correlation and distributional properties of the
original series. The DCCA and DMCA coefficients are then estimated for such series. As
the series are not cross-correlated, the expected value of the coefficients is zero. However,
the variance of the estimates can be high. Therefore, we estimate the coefficients on
1000 surrogate series to obtain a finite-sample distribution under the null hypothesis of no
cross-correlation between the series, which controls for both long-term auto-correlations and
distributional properties.
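The surrogate test can be sketched as follows, using the amplitude adjusted Fourier transform as it is usually described: Gaussianize the series by rank, randomize its Fourier phases, and map the original values back by rank. The function names are hypothetical, and the two-sided p-value with the $(count+1)/(n+1)$ correction is our choice, since the paper does not state how the p-values are computed.

```python
import numpy as np


def aaft_surrogate(x, rng):
    """Amplitude adjusted Fourier transform surrogate of x.

    Keeps exactly the values (distribution) of x and approximately its power spectrum."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x))
    # 1) Gaussian series with the same rank order as x
    g = np.sort(rng.standard_normal(n))[ranks]
    # 2) randomise Fourier phases, keep amplitudes (DC and Nyquist terms left untouched)
    G = np.fft.rfft(g)
    G_surr = np.abs(G) * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, G.size))
    G_surr[0] = G[0]
    if n % 2 == 0:
        G_surr[-1] = G[-1]
    g_surr = np.fft.irfft(G_surr, n)
    # 3) put the original values of x into the rank order of the phase-randomised series
    out = np.empty(n)
    out[np.argsort(g_surr)] = np.sort(x)
    return out


def surrogate_pvalue(x, y, coef, n_surr=1000, seed=0):
    """Two-sided p-value of coef(x, y) against independently generated AAFT surrogates."""
    rng = np.random.default_rng(seed)
    observed = coef(x, y)
    null = np.array(
        [coef(aaft_surrogate(x, rng), aaft_surrogate(y, rng)) for _ in range(n_surr)]
    )
    return observed, (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_surr + 1)
```

Here `coef` can be any function of two series, e.g. a DCCA or DMCA coefficient at a fixed scale. Drawing the surrogates of $x$ and $y$ independently removes their cross-correlation while keeping each series' own memory and distribution.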


¹ DCCA is a bivariate generalization of DFA presented in the previous section.
3. Data and results

Google provides search query time series for specified terms from 2004 onwards. However,
the series are not reported as raw numbers of searches for a given term; instead, they are
renormalized by a Google algorithm which can in essence be seen as rescaling the searches
into the 0-100 interval, so that the number represents the proportion of the specified term
among all searched terms at a given time, kept between 0 and 100. Moreover, the obtained
numbers are based on sampling from all searched terms, so they represent an estimated
rescaled proportion. Even though such a rescaling procedure can somewhat dilute the
information content of the series, the empirical results summarized in the introductory
section suggest that the series remain informative.

The Google data can be downloaded freely from the Google Trends website (trends.google.com)
at a weekly frequency. To obtain the data at a higher frequency, specifically the daily one,
one needs to download the series in three-month sections, which further need to be
rescaled and chained together. We apply this procedure to the component stocks of
the Dow Jones Industrial Average (DJIA) index between 2004 and 2013 (apart from
Exxon Mobil, J. P. Morgan and Procter & Gamble, for which the series are several months
shorter, as is evident from Tab. 2), thus obtaining 2516 observations for most series. The
most severe issue with the Google query data is the relative arbitrariness in defining the
searched terms. Further, the sampling and thresholding procedure applied by Google to its
search series quite frequently results in incomplete series. If the specified term is not
searched for frequently enough, the series is practically useless. We thus analyze only the
component stocks which provide reliable search query series. Out of 30 DJIA stocks, we end
up with 18 stocks for which the Google series are reliable without discontinuities. The
analyzed stocks are summarized in Tab. 1. We have tried various combinations and
specifications of the searched terms and we report the ones which provided the most
complete series.
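The paper does not spell out how the three-month sections are rescaled and chained. One common approach, sketched here purely as an assumption, lets consecutive daily windows overlap by at least one day, rescales each new window so that it matches the already chained series on the shared days, and finally rescales the whole series to a maximum of 100. `chain_windows` is a hypothetical helper and each window is a dict mapping dates to values on its own 0-100 scale.

```python
from datetime import date, timedelta


def chain_windows(windows):
    """Chain overlapping daily windows (each a dict date -> value on its own 0-100 scale).

    Hypothetical procedure: each new window is rescaled so that its values on the days it
    shares with the already chained series have the same sum; the result is finally
    rescaled so that its maximum is 100."""
    chained = dict(windows[0])
    for w in windows[1:]:
        overlap = [d for d in w if d in chained]
        if not overlap:
            raise ValueError("consecutive windows must overlap")
        new = sum(w[d] for d in overlap)
        if new == 0:
            raise ValueError("overlap carries no searches; cannot rescale")
        factor = sum(chained[d] for d in overlap) / new
        for d, v in w.items():
            if d not in chained:
                chained[d] = v * factor
    peak = max(chained.values())
    return {d: 100.0 * v / peak for d, v in sorted(chained.items())}
```

With the integer-rounded values Google reports, the matching is only approximate, so longer overlaps make the scaling factors more stable.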

The Google searches for the analyzed stocks are illustrated in Figs. … and …, which show
that the search frequencies for the component stocks are very heterogeneous. The trends are
sharply decreasing (IBM, Merck, Microsoft), slowly decreasing (3M, Boeing, Du Pont, GE,
Intel), increasing either rapidly (McDonald’s) or slowly (Caterpillar, Coca Cola, Exxon Mobil,
Home Depot), or remain quite stable in time (Johnson & Johnson, J. P. Morgan, United
Technologies, Walt Disney). Most of the series show strong seasonal patterns (hence the
choice of the DCCA and DMCA techniques, which are constructed for such series), mainly
connected to the end of the year, but there are also some stronger patterns, e.g. for Home
Depot. The examined dataset thus provides a complex selection of various dynamic
behaviors.

Before we get to the estimated values of the Hurst exponent, and thus to the type of memory
in question, we first test whether the analyzed series are in fact power-law correlated. In
Tab. 2, we present the testing statistics as well as the corresponding p-values for the
rescaled range and rescaled variance tests described in the previous section. Apart from
two cases (Coca Cola and IBM), power-law correlations are reported for all series (the
null hypothesis is rejected by at least one of the tests at the 10% level or lower). It needs to
be stressed that the optimal $q^*$ is high for all series and very high for some, sometimes
taking into consideration as many as 372 lags of the covariance function (here specifically
for IBM). This only strengthens the claim that the analyzed Google series are long-term
correlated, because taking into account tens of lags already practically means considering
long-term memory, even more so for hundreds of lags.

Tab. 2 also reports the estimated Hurst exponents, which are further supported by
Figs. … and …, showing an evident power-law scaling of the fluctuation functions according
to Eq. (3). For all series, the scaling is very stable and the estimated Hurst exponents are
thus reliable. Tab. 2 shows that the Hurst exponents vary between 0.83 (J. P. Morgan) and
1.16 (Home Depot). The Google searches are thus strongly persistent for all the analyzed
series. Even though the memory is very strong for these series, the Hurst exponents still
remain below 1.5, which implies that the series stay mean-reverting. In the DFA context, this
means that even though the online query series tend to wander away from the long-term trend,
they always return to it and never explode. The fact that the series remain on the edge of
stationarity and non-stationarity (around H = 1) only highlights the need for a careful
treatment of such series in the multivariate settings which are standardly applied in the
empirical literature.

To further illustrate the usefulness of the presented results, we reinvestigate the
relationship between Google searches, traded volume and volatility. The traded volume for
each component stock of the DJIA index is directly available at finance.yahoo.com, as are
the open, close, high and low prices. We utilize the provided information and construct
volatility series using the Garman-Klass variance estimator [67], defined as

$$\hat{\sigma}^2_{GK,t} = \frac{\left(\log(H_t/L_t)\right)^2}{2} - (2\log 2 - 1)\left(\log(C_t/O_t)\right)^2, \qquad (4)$$

where $H_t$ and $L_t$ are daily highs and lows, respectively, and $C_t$ and $O_t$ are daily closing
and opening prices, respectively. The estimator possesses very good statistical properties
and serves as an excellent choice without the need for high-frequency data [68]. We
study a logarithmic transformation of both $\hat{\sigma}^2_{GK,t}$ and the traded volume series,
which is a standard procedure in the applied literature. The transformation of the original
variance series allows us to comment on both variance and volatility, as the logarithmic
variance is just twice the logarithmic volatility.
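Eq. (4) and the log transformation can be written out for a single trading day as below; the function names are hypothetical. The log volatility is half the log variance, which is the "twice" relation mentioned above; days with $H_t = L_t$ give a zero variance and would need separate handling before taking logarithms.

```python
import math


def garman_klass_variance(high, low, open_, close):
    """Eq. (4): 0.5 * log(H/L)^2 - (2 log 2 - 1) * log(C/O)^2 for one trading day."""
    return 0.5 * math.log(high / low) ** 2 - (2 * math.log(2) - 1) * math.log(close / open_) ** 2


def log_volatility(high, low, open_, close):
    """Logarithm of the Garman-Klass volatility, i.e. half the log of the GK variance."""
    return 0.5 * math.log(garman_klass_variance(high, low, open_, close))
```

For consistent prices ($L_t \le O_t, C_t \le H_t$), the estimate is non-negative because $(2\log 2 - 1) \approx 0.386 < 0.5$ and $|\log(C_t/O_t)| \le \log(H_t/L_t)$.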

We examine the correlations between Google searches, traded volume and volatility
at various scales using the DCCA and DMCA coefficients. For the DCCA coefficient, we
study the correlation between the searches and traded volume, and between the searches
and volatility, for scales between 10 and 250 with a step of 10. For the DMCA coefficient,
we use moving window lengths between 11 and 251 with a step of 10 (the centered moving
average requires an odd window length). This way, we obtain comparable results from the
two methodologies.

Figs. … and … depict the results for variance and traded volume, respectively, for both
methods. Only significant correlations with a p-value below 0.10 are reported; the
insignificant ones are set to zero. We find several interesting results. First, the DMCA
method reports more stable results with more significant coefficients. This is in line
with the numerical results presented by Kristoufek [55, 60]. Second, the correlations for
traded volume are in general higher than those for volatility. Third, a majority of the
significant correlations occur at the lower scales. There thus seems to be a rather short-term
or medium-term relationship between the online searches and the examined financial
indicators. In the long term, only a few correlations are identified as significant. And fourth,
the level of correlations varies considerably across stock titles. There thus seems to be no
universality in the relationship between the searches on one side, and volatility and volume
on the other. Tables 3 and 4 further illustrate the heterogeneity of the results. There, we
present the average DCCA and DMCA coefficients across scales together with their
significance levels. The above-mentioned results are supported. First, the significance, level
and sign of the correlations vary widely. Second, the DMCA procedure delivers more
significant results. And third, the correlations are higher for volume than for volatility.
Nevertheless, many of the significant correlations are still below 0.05 in absolute value, and
practically all the correlation coefficients fall between -0.2 and 0.2. The correlations are thus
very weak even when found statistically significant.

There are still some interesting results, mainly connected to the various signs of the
correlations. For example, Microsoft shows some unorthodox behavior for volatility. A positive
relationship is usually reported, whereas the search queries for Microsoft are negatively
correlated with volatility. Conversely, traded volume shows a positive correlation. It thus
seems that general interest in Microsoft is mainly tied to positive news which stabilizes
the stock price rather than to negative news that would make the price more volatile.
Similar dynamics are found for Johnson & Johnson. The only stock which gives insignificant
results for both financial quantities is Merck. Other stocks show either positive, and thus
expected, correlations or only weak negative ones.

4. Discussion and conclusions 

We have analyzed the power-law correlations in the online search queries for the DJIA
stock components. By reconstructing the daily Google search series, we have been able to
obtain enough observations for a valid analysis of long-range dependence. Using the
combination of the rescaled range and rescaled variance tests and the detrended fluctuation
analysis, we have shown that the online searches are indeed power-law correlated.
Importantly, the level of long-term memory is very high, with Hurst exponents around unity
(between 0.83 and 1.16) for all the analyzed stocks. Such results suggest that the
finance-related online searches have dynamic properties similar to stock variance and traded
volume, which are themselves power-law correlated. The information flow coming into the
stock markets apparently enters the general public interest and keeps it for quite long periods,
so its dissipation is not immediate. As online searches and the implied attention are usually
attributed to retail and small investors, such slow dissipation of information and attention fits
the picture of a small investor using the information for decision-making over a longer term.
Such persistent dynamics might also arise from the indecisiveness of small investors, who
would think twice before investing in a specific stock. Online searches then cluster and keep
their level for longer time intervals. The results remain fascinatingly universal across the
analyzed stocks. Even though the global dynamics of the series are very heterogeneous, with
different speeds of trends or various volatility levels, all of them remain strongly persistent
with a smooth scaling of fluctuations.

In addition, we have studied the relationship between Google searches, traded volume 
and volatility using the recently proposed DCCA and DMCA coefficients. The results 
primarily suggest that there is no universal relationship between the online search queries 
and the analyzed financial measures. Even though we confirm positive correlation for a 
majority of pairs, there are several pairs with insignificant or even negative correlations. 
Further, the correlations vary quite strongly across scales. The online searches have thus 
retained their potential for financial modeling and various applications but our findings 
suggest that one needs to carefully study each stock or asset separately as the usefulness of 
the queries can fluctuate considerably. The reported results do not necessarily contradict
previous findings of statistically significant connections [23, 26, 28, 31] or of
time-varying correlations [25]. However, we stress that there seems to be no universal and
global relationship between the online searches and relevant financial variables (traded 
volume and volatility). 

Our results open several interesting directions for further research on the topic. First, the power-law
properties of the correlation structure might be observed also in different types of search 
queries, not necessarily only for stocks or financial markets in general. This would show 
how information or information seeking dissipates in time and how such behavior connects 
to other specific phenomena of the relevant time series. Second, the results indicate that 
the online searches are strongly persistent and on the edge of (non-)stationarity. Such a
characteristic implies that the simple correlation studies reported in the literature should take
this property into consideration as an inappropriate analysis of persistent data using tools 
for short-range dependent series can produce spurious and in turn misleading results. And 
third, knowing the basic dynamic properties of the series helps to construct the forecasting 
models which are of high interest for practitioners, specifically in risk management. 
Table 1: Searched terms and DJIA component stocks

No.   Company full name                      Company short name    Ticker   Search query
#1    3M Company                             3M                    MMM      3M
#2    Caterpillar Incorporated               Caterpillar           CAT      Caterpillar
#3    Coca-Cola Company                      Coca Cola             KO       Coca Cola
#4    E. I. du Pont de Nemours and Company   Du Pont               DD       DuPont
#5    Exxon Mobil Company                    Exxon Mobil           XOM      Exxon
#6    General Electric Company               General Electric      GE       GE
#7    Home Depot Incorporated                Home Depot            HD       Home Depot
#8    Intel Corporation                      Intel                 INTC     Intel
#9    International Business Machines        IBM                   IBM      IBM
#10   J. P. Morgan Chase                     J. P. Morgan          JPM      JP Morgan
#11   Johnson & Johnson                      Johnson & Johnson     JNJ      Johnson Johnson
#12   McDonald’s Corporation                 McDonald’s            MCD      McDonalds
#13   Merck & Co., Inc.                      Merck                 MRK      Merck
#14   Microsoft Corporation                  Microsoft             MSFT     Microsoft
#15   Procter & Gamble Company               Procter & Gamble      PG       P&G
#16   The Boeing Company                     Boeing                BA       Boeing
#17   United Technologies Corporation        United Technologies   UTX      UTC
#18   Walt Disney Company                    Walt Disney           DIS      Disney


Table 2: Long-term memory tests and estimated Hurst exponent 


Company              # of obs. V_n     p-value  M_n     p-value  q    H_DFA
3M                   2516      2.9267  0.0000   0.0003  0.0000   46   0.9452
Boeing               2516      2.7083  0.0000   0.0003  0.0000   65   0.9086
Caterpillar          2516      2.0131  0.0120   0.0001  0.0030   33   1.0467
Coca Cola            2516      1.6006  0.1848   0.0001  0.2647   23   0.9016
Du Pont              2516      2.6857  0.0000   0.0002  0.0000   63   0.9276
Exxon Mobil          2265      2.8268  0.0000   0.0003  0.0000   37   0.9313
General Electric     2516      2.8344  0.0000   0.0003  0.0000   63   0.9652
Home Depot           2516      1.6500  0.0599   0.0001  0.1359   67   1.1557
IBM                  2516      1.2279  0.6424   0.0001  0.1099   372  1.1478
Intel                2516      1.7875  0.0070   0.0001  0.0000   164  1.1486
Johnson & Johnson    2516      2.5457  0.0000   0.0002  0.0000   34   0.9964
J. P. Morgan         2393      2.4226  0.0000   0.0002  0.0000   26   0.8261
McDonald’s           2516      2.0612  0.0000   0.0002  0.0000   127  1.0373
Merck                2516      2.3400  0.0000   0.0002  0.0000   88   0.9314
Microsoft            2516      1.5279  0.0729   0.0001  0.0060   206  1.0896
Procter & Gamble     2393      3.2807  0.0000   0.0005  0.0000   25   0.9468
United Technologies  2516      2.7528  0.0000   0.0002  0.0070   22   0.8747
Walt Disney          2516      3.0381  0.0000   0.0004  0.0000   38   1.0003
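Table 2 combines a rescaled range test, a rescaled variance test and DFA. As a hedged illustration of the first of these, the sketch below computes Lo's modified rescaled-range statistic V_n(q). This is the range of the mean-adjusted partial sums divided by sqrt(n) times a Bartlett-weighted long-run standard deviation with lag q. We assume this is the statistic reported in the rescaled range column. The helper `andrews_q` implements Andrews' data-driven lag rule, which is commonly paired with this test. This section does not state how the table's q values were chosen. The sketch computes the statistic only and does not try to reproduce the table's p-values.

```python
import numpy as np


def lo_modified_rs(x, q):
    """Lo's modified R/S statistic V_n(q) = Q_n(q) / sqrt(n) (sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    partial_sums = np.cumsum(dev)                 # S_k for k = 1..n (S_n = 0)
    data_range = partial_sums.max() - partial_sums.min()
    # Long-run variance: sample variance plus Bartlett-weighted autocovariances.
    long_run_var = dev @ dev / n
    for j in range(1, q + 1):
        gamma_j = dev[j:] @ dev[:-j] / n
        long_run_var += 2.0 * (1.0 - j / (q + 1.0)) * gamma_j
    return data_range / np.sqrt(long_run_var * n)


def andrews_q(x):
    """Andrews' data-driven lag, based on the lag-1 autocorrelation (sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    # Absolute value avoids complex powers when the autocorrelation is negative.
    rho = abs(dev[1:] @ dev[:-1] / (dev @ dev))
    k = (1.5 * n) ** (1.0 / 3.0) * (2.0 * rho / (1.0 - rho ** 2)) ** (2.0 / 3.0)
    return int(np.floor(k))
```

Typical use is `lo_modified_rs(series, andrews_q(series))`. Strongly persistent series produce large lags, because the lag grows quickly as the lag-1 autocorrelation approaches one.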





Table 3: Average DCCA and DMCA correlations between volatility and Google searches 



Company              ρ_DCCA   σ(ρ_DCCA)  p-value  ρ_DMCA   σ(ρ_DMCA)  p-value  Sig. DCCA/DMCA  Sign
3M                    0.0657  0.0270     0.0149    0.0648  0.0112     0.0000   ✓✓              +
Boeing               -0.0372  0.0275     0.1771   -0.0281  0.0162     0.0831   ✗✓              -
Caterpillar           0.1087  0.0231     0.0000    0.1136  0.0086     0.0000   ✓✓              +
Coca Cola             0.0315  0.0214     0.1422    0.0369  0.0093     0.0001   ✗✓              +
Du Pont               0.1658  0.0194     0.0000    0.1590  0.0096     0.0000   ✓✓              +
Exxon Mobil           0.0432  0.0170     0.0111    0.0411  0.0101     0.0000   ✓✓              +
General Electric      0.1827  0.0173     0.0000    0.2181  0.0067     0.0000   ✓✓              +
Home Depot           -0.0095  0.0273     0.7283   -0.0193  0.0080     0.0156   ✗✓              -
IBM                   0.1093  0.0247     0.0000    0.1154  0.0047     0.0000   ✓✓              +
Intel                 0.0287  0.0252     0.2536    0.0302  0.0125     0.0160   ✗✓              +
Johnson & Johnson    -0.0953  0.0308     0.0020   -0.1431  0.0227     0.0000   ✓✓              -
J. P. Morgan          0.0282  0.0176     0.1093    0.0501  0.0094     0.0000   ✗✓              +
McDonald’s            0.1509  0.0176     0.0000    0.1595  0.0073     0.0000   ✓✓              +
Merck                 0.0216  0.0328     0.5090    0.0296  0.0228     0.1955   ✗✗              0
Microsoft            -0.1635  0.0323     0.0000   -0.1608  0.0250     0.0000   ✓✓              -
Procter & Gamble     -0.0118  0.0172     0.4944   -0.0141  0.0085     0.0960   ✗✓              -
United Technologies  -0.0813  0.0259     0.0017   -0.1076  0.0191     0.0000   ✓✓              -
Walt Disney           0.0841  0.0195     0.0000    0.0509  0.0052     0.0000   ✓✓              +
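The p-values in the DCCA and DMCA correlation tables agree, to about 0.001, with a two-sided normal test of the average coefficient against zero using the reported standard error. For the 3M row, 0.0657/0.0270 gives z ≈ 2.43 and p ≈ 0.015. The sketch below reproduces that reading. It also reproduces the significance and sign columns, under our interpretation that a coefficient counts as significant when p < 0.1 and that the sign is 0 when neither method is significant. The test and the classification rule are inferred from the numbers, not stated in the source. The function names are hypothetical.

```python
import math


def mean_corr_p_value(mean, se):
    """Two-sided p-value for H0: average coefficient = 0 (normal approximation)."""
    z = mean / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def classify(r_dcca, p_dcca, r_dmca, p_dmca, alpha=0.1):
    """Return (significance marks for DCCA/DMCA, sign of significant estimates)."""
    marks = ("✓" if p_dcca < alpha else "✗") + ("✓" if p_dmca < alpha else "✗")
    signs = {
        math.copysign(1.0, r)
        for r, p in ((r_dcca, p_dcca), (r_dmca, p_dmca))
        if p < alpha
    }
    if not signs:
        sign = "0"
    elif signs == {1.0}:
        sign = "+"
    elif signs == {-1.0}:
        sign = "-"
    else:
        sign = "?"  # significant estimates disagree in sign (not seen in these tables)
    return marks, sign
```

For example, `classify(-0.0372, 0.1771, -0.0281, 0.0831)` returns the Boeing entry `("✗✓", "-")`.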


Table 4: Average DCCA and DMCA correlations between traded volume and Google searches 



Company              ρ_DCCA   σ(ρ_DCCA)  p-value  ρ_DMCA   σ(ρ_DMCA)  p-value  Sig. DCCA/DMCA  Sign
3M                    0.1599  0.0302     0.0000    0.1667  0.0179     0.0000   ✓✓              +
Boeing               -0.0280  0.0251     0.2647   -0.0468  0.0186     0.0117   ✗✓              -
Caterpillar           0.0785  0.0266     0.0032    0.0848  0.0144     0.0000   ✓✓              +
Coca Cola             0.0331  0.0230     0.1508    0.0436  0.0135     0.0013   ✗✓              +
Du Pont               0.1181  0.0184     0.0000    0.1051  0.0148     0.0000   ✓✓              +
Exxon Mobil           0.0936  0.0183     0.0000    0.0889  0.0073     0.0000   ✓✓              +
General Electric      0.1093  0.0218     0.0000    0.1994  0.0056     0.0000   ✓✓              +
Home Depot            0.0262  0.0341     0.4423    0.0382  0.0274     0.1638   ✗✗              0
IBM                   0.1547  0.0151     0.0000    0.1407  0.0093     0.0000   ✓✓              +
Intel                 0.0357  0.0254     0.1609    0.0345  0.0096     0.0003   ✗✓              +
Johnson & Johnson     0.0211  0.0254     0.4066   -0.0172  0.0173     0.3192   ✗✗              0
J. P. Morgan          0.1258  0.0195     0.0000    0.1008  0.0077     0.0000   ✓✓              +
McDonald’s            0.2032  0.0185     0.0000    0.2168  0.0133     0.0000   ✓✓              +
Merck                 0.0463  0.0447     0.3009    0.0328  0.0318     0.3022   ✗✗              0
Microsoft             0.0993  0.0248     0.0001    0.0937  0.0087     0.0000   ✓✓              +
Procter & Gamble      0.0724  0.0212     0.0007    0.0865  0.0199     0.0000   ✓✓              +
United Technologies  -0.0856  0.0238     0.0003   -0.0809  0.0156     0.0000   ✓✓              -
Walt Disney           0.2297  0.0437     0.0000    0.1317  0.0209     0.0000   ✓✓              +
Figure 1: Normalized Google searches (Part 1). The covered period runs from 1.1.2004 to 
31.12.2013 at daily frequency. 
Figure 2: Normalized Google searches (Part 2). 
Figure 3: DFA scaling of Google searches related to the DJIA component stocks (Part 1). 
The log-log representation shows pronounced linear scaling, which is characteristic of long-range correlated processes. 
Estimated Hurst exponents are summarized in Tab. 
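The Hurst exponents behind Figure 3 are read off as the slope of the log-log fluctuation plot. The following is a minimal DFA sketch of that procedure. It assumes first-order (linear) detrending in non-overlapping windows and 20 log-spaced window sizes between 10 and n/5. These choices are illustrative and are not stated in the caption. An exponent near 0.5 indicates no long-range correlation, and values above 0.5 indicate persistence.

```python
import numpy as np


def dfa_exponent(x, scales=None, order=1):
    """DFA fluctuation exponent: slope of log F(s) against log s (sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if scales is None:
        # Assumed default: log-spaced window sizes from 10 to n/5.
        scales = np.unique(np.logspace(1, np.log10(n // 5), 20).astype(int))
    profile = np.cumsum(x - x.mean())          # integrated, mean-removed series
    fluct = []
    for s in scales:
        n_seg = n // s
        seg = profile[: n_seg * s].reshape(n_seg, s).T   # one column per window
        t = np.arange(s)
        coefs = np.polyfit(t, seg, order)                # local polynomial trends
        trend = np.vander(t, order + 1) @ coefs          # fitted trend per window
        fluct.append(np.sqrt(np.mean((seg - trend) ** 2)))  # F(s)
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope
```

On simulated white noise the estimate should be close to 0.5. On its cumulative sum (a random walk) it should be close to 1.5, which is why DFA exponents above 1 signal non-stationary behaviour.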
DCCA correlation between Google searches and variance 
Figure 5: Correlation coefficients between Google searches and variance. The correlations are 
shown for the DCCA (left) and DMCA (right) methods as functions of the scale and of the moving-average 
window, respectively. Results are shown for all analyzed series; only significant correlations (with 
p-value below 0.1) are reported. 
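The DCCA curves in Figure 5 trace the detrended cross-correlation coefficient as the scale s varies. The usual form is ρ_DCCA(s) = F²_xy(s) / (F_x(s) F_y(s)), where F²_xy is the detrended covariance of the two profiles and F_x, F_y are the corresponding DFA fluctuations. This sketch is a simplified variant: it uses linear detrending in non-overlapping boxes of length s. We have not confirmed that this box scheme matches the one used for the figure, and the function names are hypothetical.

```python
import numpy as np


def _box_residuals(profile, s):
    """Residuals after a linear fit in each non-overlapping box of length s."""
    n_box = profile.size // s
    boxes = profile[: n_box * s].reshape(n_box, s).T   # shape (s, n_box)
    t = np.arange(s)
    coefs = np.polyfit(t, boxes, 1)                    # slope and intercept per box
    return boxes - np.vander(t, 2) @ coefs


def rho_dcca(x, y, s):
    """Detrended cross-correlation coefficient at scale s (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D series of equal length")
    rx = _box_residuals(np.cumsum(x - x.mean()), s)
    ry = _box_residuals(np.cumsum(y - y.mean()), s)
    f2_xy = np.mean(rx * ry)      # detrended covariance F^2_xy(s)
    f2_x = np.mean(rx ** 2)       # detrended variance F^2_x(s)
    f2_y = np.mean(ry ** 2)
    return f2_xy / np.sqrt(f2_x * f2_y)
```

By the Cauchy-Schwarz inequality the coefficient lies in [-1, 1]. Evaluating it over a grid of s values gives a curve of the kind plotted in the left panel.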
DMCA correlation between Google searches and traded volume 
Figure 6: Correlation coefficients between Google searches and traded volume. The notation 
follows Fig. 
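The DMCA panels replace box-wise polynomial detrending with a moving average of each profile, so the horizontal axis is the moving-average window λ rather than a scale s. This is a minimal sketch, assuming a centered moving average with an odd window (`lam`). Backward-looking variants also exist, and this section does not say which one was used. The coefficient is again the detrended covariance divided by the product of the detrended standard deviations.

```python
import numpy as np


def rho_dmca(x, y, lam):
    """DMCA correlation coefficient with a centered moving average (sketch)."""
    if lam < 3 or lam % 2 == 0:
        raise ValueError("lam must be an odd integer >= 3")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D series of equal length")
    half = lam // 2
    kernel = np.ones(lam) / lam

    def residuals(series):
        profile = np.cumsum(series - series.mean())
        # 'valid' mode yields the mean of profile[i : i + lam], centered at i + half.
        moving_avg = np.convolve(profile, kernel, mode="valid")
        return profile[half : profile.size - half] - moving_avg

    rx, ry = residuals(x), residuals(y)
    f2_xy = np.mean(rx * ry)        # detrended covariance F^2_xy(lam)
    return f2_xy / np.sqrt(np.mean(rx ** 2) * np.mean(ry ** 2))
```

A centered window uses observations on both sides of each point. This avoids the lag that a backward average introduces, at the cost of dropping `lam // 2` points at each end.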